{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 60分钟入门深度学习工具-PyTorch（二、Autograd: 自动求导）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**作者**：Soumith Chintala\n",
    "\n",
    "原文翻译自：https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html\n",
    "    \n",
    "中文翻译、注释制作：黄海广\n",
    "\n",
    "github：https://github.com/fengdu78\n",
    "\n",
    "代码全部测试通过。\n",
    "\n",
    "配置环境：PyTorch 1.0，python 3.6，\n",
    "\n",
    "主机：显卡：一块1080ti；内存：32g（注：绝大部分代码不需要GPU）\n",
    "![公众号](images/gongzhong.jpg)\n",
    "### 目录\n",
    "* 1.[Pytorch是什么？](60分钟入门PyTorch-1.PyTorch是什么？.ipynb)\n",
    "* 2.[AUTOGRAD](60分钟入门PyTorch-2.AUTOGRAD.ipynb)\n",
    "* 3.[神经网络](60分钟入门PyTorch-3.神经网络.ipynb)\n",
    "* 4.[训练一个分类器](60分钟入门PyTorch-4.训练一个分类器.ipynb)\n",
    "* 5.[数据并行](60分钟入门PyTorch-5.数据并行.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 二、Autograd: 自动求导(automatic differentiation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PyTorch 中所有神经网络的核心是`autograd`包.我们首先简单介绍一下这个包,然后训练我们的第一个神经网络.\n",
    "\n",
    "`autograd`包为张量上的所有操作提供了自动求导.它是一个运行时定义的框架,这意味着反向传播是根据你的代码如何运行来定义,并且每次迭代可以不同.\n",
    "\n",
    "接下来我们用一些简单的示例来看这个包:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 张量(Tensor)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`torch.Tensor`是包的核心类。如果将其属性`.requires_grad`设置为True，则会开始跟踪其上的所有操作。完成计算后，您可以调用`.backward()`并自动计算所有梯度。此张量的梯度将累积到`.grad`属性中。\n",
    "\n",
    "要阻止张量跟踪历史记录，可以调用`.detach()`将其从计算历史记录中分离出来，并防止将来的计算被跟踪。\n",
    "\n",
    "要防止跟踪历史记录（和使用内存），您还可以使用torch.no_grad()包装代码块：在评估模型时，这可能特别有用，因为模型可能具有`requires_grad = True`的可训练参数，但我们不需要梯度。\n",
    "\n",
    "还有一个类对于autograd实现非常重要 - Function。\n",
    "\n",
    "Tensor和Function互相连接并构建一个非循环图构建一个完整的计算过程。每个张量都有一个`.grad_fn`属性，该属性引用已创建Tensor的Function（除了用户创建的Tensors  - 它们的`grad_fn`为`None`）。\n",
    "\n",
    "如果要计算导数，可以在Tensor上调用`.backward()`。如果Tensor是标量（即它包含一个元素数据），则不需要为`backward()`指定任何参数，但是如果它有更多元素，则需要指定一个梯度参数，该参数是匹配形状的张量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import torch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "创建一个张量并设置`requires_grad = True`以跟踪它的计算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[1., 1.],\n",
      "        [1., 1.]], requires_grad=True)\n"
     ]
    }
   ],
   "source": [
    "x = torch.ones(2, 2, requires_grad=True)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在张量上执行操作:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[3., 3.],\n",
      "        [3., 3.]], grad_fn=<AddBackward0>)\n"
     ]
    }
   ],
   "source": [
    "y = x + 2\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为y是通过一个操作创建的,所以它有grad_fn,而x是由用户创建,所以它的grad_fn为None."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<AddBackward0 object at 0x000001E020B794A8>\n",
      "None\n"
     ]
    }
   ],
   "source": [
    "print(y.grad_fn)\n",
    "print(x.grad_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在y上执行操作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[27., 27.],\n",
      "        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward1>)\n"
     ]
    }
   ],
   "source": [
    "z = y * y * 3\n",
    "out = z.mean()\n",
    "\n",
    "print(z, out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`.requires\\_grad_(...)`就地更改现有的Tensor的`requires_grad`标志。 如果没有给出，输入标志默认为False。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "False\n",
      "True\n",
      "<SumBackward0 object at 0x000001E020B79FD0>\n"
     ]
    }
   ],
   "source": [
    "a = torch.randn(2, 2)\n",
    "a = ((a * 3) / (a - 1))\n",
    "print(a.requires_grad)\n",
    "a.requires_grad_(True)\n",
    "print(a.requires_grad)\n",
    "b = (a * a).sum()\n",
    "print(b.grad_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 梯度(Gradients)\n",
    "\n",
    "现在我们来执行反向传播,`out.backward()`相当于执行`out.backward(torch.tensor(1.))`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "out.backward()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "输出out对x的梯度d(out)/dx:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[4.5000, 4.5000],\n",
      "        [4.5000, 4.5000]])\n"
     ]
    }
   ],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "你应该得到一个值全为4.5的矩阵,我们把张量out称为\"$o$\". 则：$o = \\frac{1}{4}\\sum_i z_i$,${{z}_{i}}=3{{(x+2)}^{2}}$ ，并且 $z\\left| _{{{x}_{i}}=1} \\right.=27$  ，所以，$\\frac{\\partial o}{\\partial x_i} = \\frac{3}{2}(x_i+2)$, 因此$\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=1} = \\frac{9}{2} = 4.5$\n",
    "在数学上，如果你有一个向量值函数$\\vec{y}=f(\\vec{x})$ ，则$\\vec{y}$相对于$\\vec{x}$的梯度是雅可比矩阵：\n",
    "\n",
    "\n",
    "$J=\\left( \\begin{matrix}\n",
    "   \\frac{\\partial {{y}_{1}}}{\\partial {{x}_{1}}} & \\ldots  & \\frac{\\partial {{y}_{m}}}{\\partial {{x}_{1}}}  \\\\\n",
    "   \\vdots  & \\ddots  & \\vdots   \\\\\n",
    "   \\frac{\\partial {{y}_{1}}}{\\partial {{x}_{n}}} & \\cdots  & \\frac{\\partial {{y}_{m}}}{\\partial {{x}_{n}}}  \\\\\n",
    "\\end{matrix} \\right)$\n",
    "\n",
    "一般来说，torch.autograd是一个计算雅可比向量积的引擎。 也就是说，给定任何向量$v =(v_1 v_2 ...v_m)^T$，计算乘积$J\\cdot v$。 如果$v$恰好是标量函数的梯度$l=g(\\vec{y})$，即$v={{(\\frac{\\partial l}{\\partial {{y}_{1}}}\\cdots \\frac{\\partial l}{\\partial {{y}_{m}}})}^{T}}$ 然后根据链式法则，雅可比向量乘积将是$l$相对于$\\vec{x}$的梯度\n",
    "\n",
    "$J\\centerdot v=\\left( \\begin{matrix}\n",
    "   \\frac{\\partial {{y}_{1}}}{\\partial {{x}_{1}}} & \\ldots  & \\frac{\\partial {{y}_{m}}}{\\partial {{x}_{1}}}  \\\\\n",
    "   \\vdots  & \\ddots  & \\vdots   \\\\\n",
    "   \\frac{\\partial {{y}_{1}}}{\\partial {{x}_{n}}} & \\cdots  & \\frac{\\partial {{y}_{m}}}{\\partial {{x}_{n}}}  \\\\\n",
    "\\end{matrix} \\right)\\left( \\begin{matrix}\n",
    "   \\frac{\\partial l}{\\partial {{y}_{1}}}  \\\\\n",
    "   \\vdots   \\\\\n",
    "   \\frac{\\partial l}{\\partial {{y}_{m}}}  \\\\\n",
    "\\end{matrix} \\right)=\\left( \\begin{matrix}\n",
    "   \\frac{\\partial l}{\\partial {{x}_{1}}}  \\\\\n",
    "   \\vdots   \\\\\n",
    "   \\frac{\\partial l}{\\partial {{x}_{n}}}  \\\\\n",
    "\\end{matrix} \\right)$\n",
    "\n",
    "雅可比向量积的这种特性使得将外部梯度馈送到具有非标量输出的模型中非常方便。\n",
    "\n",
    "现在让我们来看一个雅可比向量积的例子："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([  384.5854,   -13.6405, -1049.2870], grad_fn=<MulBackward0>)\n"
     ]
    }
   ],
   "source": [
    "x = torch.randn(3, requires_grad=True)\n",
    "y = x * 2\n",
    "while y.data.norm() < 1000:\n",
    "    y = y * 2\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在在这种情况下，y不再是标量。 `torch.autograd`无法直接计算完整雅可比行列式，但如果我们只想要雅可比向量积，只需将向量作为参数向后传递："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])\n"
     ]
    }
   ],
   "source": [
    "v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)\n",
    "y.backward(v)\n",
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "您还可以通过torch.no_grad()代码，在张量上使用.requires_grad = True来停止使用跟踪历史记录。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "print(x.requires_grad)\n",
    "print((x ** 2).requires_grad)\n",
    "\n",
    "with torch.no_grad():\n",
    "    print((x ** 2).requires_grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关于`autograd`和`Function`的文档在http://pytorch.org/docs/autograd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "本章的官方代码：\n",
    "* Python：[autograd_tutorial.py](download/autograd_tutorial.py)\n",
    "* Jupyter notebook:[autograd_tutorial.ipynb](download/autograd_tutorial.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
